{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "1c0beb97",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/output_parsing/evaporate_program.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "8cd3f128-866a-4857-a00a-df19f926c952",
   "metadata": {},
   "source": [
    "# Evaporate Demo\n",
    "\n",
    "This demo shows how to extract a DataFrame from raw text using the approach from the Evaporate paper (Arora et al.): https://arxiv.org/abs/2304.09433.\n",
    "\n",
    "The idea is to first \"fit\" on a set of training text. The fitting process uses the LLM to generate a set of parsing functions from the text.\n",
    "These fitted functions are then applied to new text at inference time."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "7c152e41",
   "metadata": {},
   "source": [
    "If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b6cc721a",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index-llms-openai\n",
    "%pip install llama-index-program-evaporate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3096cbad",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "db7210f2-f19d-4112-ab72-ddb3afe282f7",
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "%autoreload 2"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "da19d340-57b5-439f-9cb1-5ba9576ec304",
   "metadata": {},
   "source": [
    "## Use `DFEvaporateProgram`\n",
    "\n",
    "The `DFEvaporateProgram` extracts a 2D DataFrame from a set of datapoints, given a set of fields and some training data on which to \"fit\" the extraction functions."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "a299cad8-af81-4974-a3de-ed43877d3490",
   "metadata": {},
   "source": [
    "### Load data\n",
    "\n",
    "Here we download the Wikipedia articles for a set of cities."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "daf434f6-3b27-4805-9de8-8fc92d7d776b",
   "metadata": {},
   "outputs": [],
   "source": [
    "wiki_titles = [\"Toronto\", \"Seattle\", \"Chicago\", \"Boston\", \"Houston\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8438168c-3b1b-425e-98b0-2c67a8a58a5f",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "import requests\n",
    "\n",
    "# Create the output directory once, before downloading anything\n",
    "data_path = Path(\"data\")\n",
    "data_path.mkdir(exist_ok=True)\n",
    "\n",
    "for title in wiki_titles:\n",
    "    # Fetch the plain-text extract of each article from the Wikipedia API\n",
    "    response = requests.get(\n",
    "        \"https://en.wikipedia.org/w/api.php\",\n",
    "        params={\n",
    "            \"action\": \"query\",\n",
    "            \"format\": \"json\",\n",
    "            \"titles\": title,\n",
    "            \"prop\": \"extracts\",\n",
    "            # 'exintro': True,\n",
    "            \"explaintext\": True,\n",
    "        },\n",
    "    ).json()\n",
    "    page = next(iter(response[\"query\"][\"pages\"].values()))\n",
    "    wiki_text = page[\"extract\"]\n",
    "\n",
    "    # Write as UTF-8 so non-ASCII characters are saved consistently across platforms\n",
    "    with open(data_path / f\"{title}.txt\", \"w\", encoding=\"utf-8\") as fp:\n",
    "        fp.write(wiki_text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c01dbcb8-5ea1-4e76-b5de-ea5ebe4f0392",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import SimpleDirectoryReader\n",
    "\n",
    "# Load all wiki documents\n",
    "city_docs = {}\n",
    "for wiki_title in wiki_titles:\n",
    "    city_docs[wiki_title] = SimpleDirectoryReader(\n",
    "        input_files=[f\"data/{wiki_title}.txt\"]\n",
    "    ).load_data()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "e7310883-2aeb-4a4d-b101-b3279e670ea8",
   "metadata": {},
   "source": [
    "### Parse Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b8e98279-b4c4-41ec-b696-13e6a6f841a4",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.llms.openai import OpenAI\n",
    "from llama_index.core import Settings\n",
    "\n",
    "# setup settings\n",
    "Settings.llm = OpenAI(temperature=0, model=\"gpt-3.5-turbo\")\n",
    "Settings.chunk_size = 512"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "74c6c1c3-b797-45c8-b692-7a6e4bd1898d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# get nodes for each document\n",
    "city_nodes = {}\n",
    "for wiki_title in wiki_titles:\n",
    "    docs = city_docs[wiki_title]\n",
    "    nodes = Settings.node_parser.get_nodes_from_documents(docs)\n",
    "    city_nodes[wiki_title] = nodes"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "bb369a78-e634-43f4-805e-52f6ea0f3588",
   "metadata": {},
   "source": [
    "### Running the DFEvaporateProgram\n",
    "\n",
    "Here we demonstrate how to extract datapoints with our `DFEvaporateProgram`. Given a set of fields, the `DFEvaporateProgram` can first fit functions on a set of training data, and then run extraction over inference data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6c260836",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.program.evaporate import DFEvaporateProgram\n",
    "\n",
    "# define program\n",
    "program = DFEvaporateProgram.from_defaults(\n",
    "    fields_to_extract=[\"population\"],\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "c548768e-9d4a-4708-9c84-9266503edf01",
   "metadata": {},
   "source": [
    "### Fitting Functions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6c186eb7-116f-4b28-a508-8639cbc86633",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'population': 'def get_population_field(text: str):\\n    \"\"\"\\n    Function to extract population. \\n    \"\"\"\\n    \\n    # Use regex to find the population field\\n    pattern = r\\'(?<=population of )(\\\\d+,?\\\\d*)\\'\\n    population_field = re.search(pattern, text).group(1)\\n    \\n    # Return the population field as a single value\\n    return int(population_field.replace(\\',\\', \\'\\'))'}"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "program.fit_fields(city_nodes[\"Toronto\"][:1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "483676a4-4937-40a8-acd9-8fec4a991270",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "def get_population_field(text: str):\n",
      "    \"\"\"\n",
      "    Function to extract population. \n",
      "    \"\"\"\n",
      "    \n",
      "    # Use regex to find the population field\n",
      "    pattern = r'(?<=population of )(\\d+,?\\d*)'\n",
      "    population_field = re.search(pattern, text).group(1)\n",
      "    \n",
      "    # Return the population field as a single value\n",
      "    return int(population_field.replace(',', ''))\n"
     ]
    }
   ],
   "source": [
    "# view extracted function\n",
    "print(program.get_function_str(\"population\"))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "508a442c-d7d8-4a27-8add-1d58f1ecc66b",
   "metadata": {},
   "source": [
    "### Run Inference"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "83e38b62-bad0-4154-9597-555a27e976d9",
   "metadata": {},
   "outputs": [],
   "source": [
    "seattle_df = program(nodes=city_nodes[\"Seattle\"][:1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dc72f611-da8b-4882-b532-69e46b9589bb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "DataFrameRowsOnly(rows=[DataFrameRow(row_values=[749256])])"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "seattle_df"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "9465ba41-8318-40bb-a202-49df6e3c16e3",
   "metadata": {},
   "source": [
    "## Use `MultiValueEvaporateProgram`\n",
    "\n",
    "In contrast to the `DFEvaporateProgram`, which assumes the output follows a 2D tabular format (one row per node), the `MultiValueEvaporateProgram` returns a list of `DataFrameRow` objects. Each object corresponds to a column and can contain a variable number of values. This helps when we want to extract multiple values for one field from a given piece of text.\n",
    "\n",
    "In this example, we use this program to parse countries and medal counts from an HTML excerpt of the all-time Olympic Games medal table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c3d5e9dd-0d20-447b-96b2-a82f8350e430",
   "metadata": {},
   "outputs": [],
   "source": [
    "Settings.llm = OpenAI(temperature=0, model=\"gpt-4\")\n",
    "Settings.chunk_size = 1024\n",
    "Settings.chunk_overlap = 0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "08b44698-4f7e-4686-9b6e-1b77c341a778",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.data_structs import Node\n",
    "\n",
    "# Olympic total medal counts: https://en.wikipedia.org/wiki/All-time_Olympic_Games_medal_table\n",
    "\n",
    "train_text = \"\"\"\n",
    "<table class=\"wikitable sortable\" style=\"margin-top:0; text-align:center; font-size:90%;\">\n",
    "\n",
    "<tbody><tr>\n",
    "<th>Team (IOC code)\n",
    "</th>\n",
    "<th>No. Summer\n",
    "</th>\n",
    "<th>No. Winter\n",
    "</th>\n",
    "<th>No. Games\n",
    "</th></tr>\n",
    "<tr>\n",
    "<td align=\"left\"><span id=\"ALB\"><img alt=\"\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/3/36/Flag_of_Albania.svg/22px-Flag_of_Albania.svg.png\" decoding=\"async\" width=\"22\" height=\"16\" class=\"thumbborder\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/3/36/Flag_of_Albania.svg/33px-Flag_of_Albania.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/3/36/Flag_of_Albania.svg/44px-Flag_of_Albania.svg.png 2x\" data-file-width=\"980\" data-file-height=\"700\" />&#160;<a href=\"/wiki/Albania_at_the_Olympics\" title=\"Albania at the Olympics\">Albania</a>&#160;<span style=\"font-size:90%;\">(ALB)</span></span>\n",
    "</td>\n",
    "<td style=\"background:#f2f2ce;\">9</td>\n",
    "<td style=\"background:#cedff2;\">5</td>\n",
    "<td>14\n",
    "</td></tr>\n",
    "<tr>\n",
    "<td align=\"left\"><span id=\"ASA\"><img alt=\"\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/8/87/Flag_of_American_Samoa.svg/22px-Flag_of_American_Samoa.svg.png\" decoding=\"async\" width=\"22\" height=\"11\" class=\"thumbborder\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/8/87/Flag_of_American_Samoa.svg/33px-Flag_of_American_Samoa.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/8/87/Flag_of_American_Samoa.svg/44px-Flag_of_American_Samoa.svg.png 2x\" data-file-width=\"1000\" data-file-height=\"500\" />&#160;<a href=\"/wiki/American_Samoa_at_the_Olympics\" title=\"American Samoa at the Olympics\">American Samoa</a>&#160;<span style=\"font-size:90%;\">(ASA)</span></span>\n",
    "</td>\n",
    "<td style=\"background:#f2f2ce;\">9</td>\n",
    "<td style=\"background:#cedff2;\">2</td>\n",
    "<td>11\n",
    "</td></tr>\n",
    "<tr>\n",
    "<td align=\"left\"><span id=\"AND\"><img alt=\"\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/1/19/Flag_of_Andorra.svg/22px-Flag_of_Andorra.svg.png\" decoding=\"async\" width=\"22\" height=\"15\" class=\"thumbborder\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/1/19/Flag_of_Andorra.svg/33px-Flag_of_Andorra.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/1/19/Flag_of_Andorra.svg/44px-Flag_of_Andorra.svg.png 2x\" data-file-width=\"1000\" data-file-height=\"700\" />&#160;<a href=\"/wiki/Andorra_at_the_Olympics\" title=\"Andorra at the Olympics\">Andorra</a>&#160;<span style=\"font-size:90%;\">(AND)</span></span>\n",
    "</td>\n",
    "<td style=\"background:#f2f2ce;\">12</td>\n",
    "<td style=\"background:#cedff2;\">13</td>\n",
    "<td>25\n",
    "</td></tr>\n",
    "<tr>\n",
    "<td align=\"left\"><span id=\"ANG\"><img alt=\"\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/9/9d/Flag_of_Angola.svg/22px-Flag_of_Angola.svg.png\" decoding=\"async\" width=\"22\" height=\"15\" class=\"thumbborder\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/9/9d/Flag_of_Angola.svg/33px-Flag_of_Angola.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/9/9d/Flag_of_Angola.svg/44px-Flag_of_Angola.svg.png 2x\" data-file-width=\"900\" data-file-height=\"600\" />&#160;<a href=\"/wiki/Angola_at_the_Olympics\" title=\"Angola at the Olympics\">Angola</a>&#160;<span style=\"font-size:90%;\">(ANG)</span></span>\n",
    "</td>\n",
    "<td style=\"background:#f2f2ce;\">10</td>\n",
    "<td style=\"background:#cedff2;\">0</td>\n",
    "<td>10\n",
    "</td></tr>\n",
    "<tr>\n",
    "<td align=\"left\"><span id=\"ANT\"><img alt=\"\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/8/89/Flag_of_Antigua_and_Barbuda.svg/22px-Flag_of_Antigua_and_Barbuda.svg.png\" decoding=\"async\" width=\"22\" height=\"15\" class=\"thumbborder\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/8/89/Flag_of_Antigua_and_Barbuda.svg/33px-Flag_of_Antigua_and_Barbuda.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/8/89/Flag_of_Antigua_and_Barbuda.svg/44px-Flag_of_Antigua_and_Barbuda.svg.png 2x\" data-file-width=\"900\" data-file-height=\"600\" />&#160;<a href=\"/wiki/Antigua_and_Barbuda_at_the_Olympics\" title=\"Antigua and Barbuda at the Olympics\">Antigua and Barbuda</a>&#160;<span style=\"font-size:90%;\">(ANT)</span></span>\n",
    "</td>\n",
    "<td style=\"background:#f2f2ce;\">11</td>\n",
    "<td style=\"background:#cedff2;\">0</td>\n",
    "<td>11\n",
    "</td></tr>\n",
    "<tr>\n",
    "<td align=\"left\"><span id=\"ARU\"><img alt=\"\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/f/f6/Flag_of_Aruba.svg/22px-Flag_of_Aruba.svg.png\" decoding=\"async\" width=\"22\" height=\"15\" class=\"thumbborder\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/f/f6/Flag_of_Aruba.svg/33px-Flag_of_Aruba.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/f/f6/Flag_of_Aruba.svg/44px-Flag_of_Aruba.svg.png 2x\" data-file-width=\"900\" data-file-height=\"600\" />&#160;<a href=\"/wiki/Aruba_at_the_Olympics\" title=\"Aruba at the Olympics\">Aruba</a>&#160;<span style=\"font-size:90%;\">(ARU)</span></span>\n",
    "</td>\n",
    "<td style=\"background:#f2f2ce;\">9</td>\n",
    "<td style=\"background:#cedff2;\">0</td>\n",
    "<td>9\n",
    "</td></tr>\n",
    "\"\"\"\n",
    "train_nodes = [Node(text=train_text)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fa6636a0-aa33-43ae-8ec2-c706fba693ef",
   "metadata": {},
   "outputs": [],
   "source": [
    "infer_text = \"\"\"\n",
    "<td align=\"left\"><span id=\"BAN\"><img alt=\"\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/f/f9/Flag_of_Bangladesh.svg/22px-Flag_of_Bangladesh.svg.png\" decoding=\"async\" width=\"22\" height=\"13\" class=\"thumbborder\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/f/f9/Flag_of_Bangladesh.svg/33px-Flag_of_Bangladesh.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/f/f9/Flag_of_Bangladesh.svg/44px-Flag_of_Bangladesh.svg.png 2x\" data-file-width=\"1000\" data-file-height=\"600\" />&#160;<a href=\"/wiki/Bangladesh_at_the_Olympics\" title=\"Bangladesh at the Olympics\">Bangladesh</a>&#160;<span style=\"font-size:90%;\">(BAN)</span></span>\n",
    "</td>\n",
    "<td style=\"background:#f2f2ce;\">10</td>\n",
    "<td style=\"background:#cedff2;\">0</td>\n",
    "<td>10\n",
    "</td></tr>\n",
    "<tr>\n",
    "<td align=\"left\"><span id=\"BIZ\"><img alt=\"\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/e/e7/Flag_of_Belize.svg/22px-Flag_of_Belize.svg.png\" decoding=\"async\" width=\"22\" height=\"13\" class=\"thumbborder\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/e/e7/Flag_of_Belize.svg/33px-Flag_of_Belize.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/e/e7/Flag_of_Belize.svg/44px-Flag_of_Belize.svg.png 2x\" data-file-width=\"1000\" data-file-height=\"600\" />&#160;<a href=\"/wiki/Belize_at_the_Olympics\" title=\"Belize at the Olympics\">Belize</a>&#160;<span style=\"font-size:90%;\">(BIZ)</span></span> <sup class=\"reference\" id=\"ref_BIZBIZ\"><a href=\"#endnote_BIZBIZ\">[BIZ]</a></sup>\n",
    "</td>\n",
    "<td style=\"background:#f2f2ce;\">13</td>\n",
    "<td style=\"background:#cedff2;\">0</td>\n",
    "<td>13\n",
    "</td></tr>\n",
    "<tr>\n",
    "<td align=\"left\"><span id=\"BEN\"><img alt=\"\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/0/0a/Flag_of_Benin.svg/22px-Flag_of_Benin.svg.png\" decoding=\"async\" width=\"22\" height=\"15\" class=\"thumbborder\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/0/0a/Flag_of_Benin.svg/33px-Flag_of_Benin.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/0/0a/Flag_of_Benin.svg/44px-Flag_of_Benin.svg.png 2x\" data-file-width=\"900\" data-file-height=\"600\" />&#160;<a href=\"/wiki/Benin_at_the_Olympics\" title=\"Benin at the Olympics\">Benin</a>&#160;<span style=\"font-size:90%;\">(BEN)</span></span> <sup class=\"reference\" id=\"ref_BENBEN\"><a href=\"#endnote_BENBEN\">[BEN]</a></sup>\n",
    "</td>\n",
    "<td style=\"background:#f2f2ce;\">12</td>\n",
    "<td style=\"background:#cedff2;\">0</td>\n",
    "<td>12\n",
    "</td></tr>\n",
    "<tr>\n",
    "<td align=\"left\"><span id=\"BHU\"><img alt=\"\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/9/91/Flag_of_Bhutan.svg/22px-Flag_of_Bhutan.svg.png\" decoding=\"async\" width=\"22\" height=\"15\" class=\"thumbborder\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/9/91/Flag_of_Bhutan.svg/33px-Flag_of_Bhutan.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/9/91/Flag_of_Bhutan.svg/44px-Flag_of_Bhutan.svg.png 2x\" data-file-width=\"900\" data-file-height=\"600\" />&#160;<a href=\"/wiki/Bhutan_at_the_Olympics\" title=\"Bhutan at the Olympics\">Bhutan</a>&#160;<span style=\"font-size:90%;\">(BHU)</span></span>\n",
    "</td>\n",
    "<td style=\"background:#f2f2ce;\">10</td>\n",
    "<td style=\"background:#cedff2;\">0</td>\n",
    "<td>10\n",
    "</td></tr>\n",
    "<tr>\n",
    "<td align=\"left\"><span id=\"BOL\"><img alt=\"\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/4/48/Flag_of_Bolivia.svg/22px-Flag_of_Bolivia.svg.png\" decoding=\"async\" width=\"22\" height=\"15\" class=\"thumbborder\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/4/48/Flag_of_Bolivia.svg/33px-Flag_of_Bolivia.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/4/48/Flag_of_Bolivia.svg/44px-Flag_of_Bolivia.svg.png 2x\" data-file-width=\"1100\" data-file-height=\"750\" />&#160;<a href=\"/wiki/Bolivia_at_the_Olympics\" title=\"Bolivia at the Olympics\">Bolivia</a>&#160;<span style=\"font-size:90%;\">(BOL)</span></span>\n",
    "</td>\n",
    "<td style=\"background:#f2f2ce;\">15</td>\n",
    "<td style=\"background:#cedff2;\">7</td>\n",
    "<td>22\n",
    "</td></tr>\n",
    "<tr>\n",
    "<td align=\"left\"><span id=\"BIH\"><img alt=\"\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/b/bf/Flag_of_Bosnia_and_Herzegovina.svg/22px-Flag_of_Bosnia_and_Herzegovina.svg.png\" decoding=\"async\" width=\"22\" height=\"11\" class=\"thumbborder\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/b/bf/Flag_of_Bosnia_and_Herzegovina.svg/33px-Flag_of_Bosnia_and_Herzegovina.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/b/bf/Flag_of_Bosnia_and_Herzegovina.svg/44px-Flag_of_Bosnia_and_Herzegovina.svg.png 2x\" data-file-width=\"800\" data-file-height=\"400\" />&#160;<a href=\"/wiki/Bosnia_and_Herzegovina_at_the_Olympics\" title=\"Bosnia and Herzegovina at the Olympics\">Bosnia and Herzegovina</a>&#160;<span style=\"font-size:90%;\">(BIH)</span></span>\n",
    "</td>\n",
    "<td style=\"background:#f2f2ce;\">8</td>\n",
    "<td style=\"background:#cedff2;\">8</td>\n",
    "<td>16\n",
    "</td></tr>\n",
    "<tr>\n",
    "<td align=\"left\"><span id=\"IVB\"><img alt=\"\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/4/42/Flag_of_the_British_Virgin_Islands.svg/22px-Flag_of_the_British_Virgin_Islands.svg.png\" decoding=\"async\" width=\"22\" height=\"11\" class=\"thumbborder\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/4/42/Flag_of_the_British_Virgin_Islands.svg/33px-Flag_of_the_British_Virgin_Islands.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/4/42/Flag_of_the_British_Virgin_Islands.svg/44px-Flag_of_the_British_Virgin_Islands.svg.png 2x\" data-file-width=\"1200\" data-file-height=\"600\" />&#160;<a href=\"/wiki/British_Virgin_Islands_at_the_Olympics\" title=\"British Virgin Islands at the Olympics\">British Virgin Islands</a>&#160;<span style=\"font-size:90%;\">(IVB)</span></span>\n",
    "</td>\n",
    "<td style=\"background:#f2f2ce;\">10</td>\n",
    "<td style=\"background:#cedff2;\">2</td>\n",
    "<td>12\n",
    "</td></tr>\n",
    "<tr>\n",
    "<td align=\"left\"><span id=\"BRU\"><img alt=\"\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/9/9c/Flag_of_Brunei.svg/22px-Flag_of_Brunei.svg.png\" decoding=\"async\" width=\"22\" height=\"11\" class=\"thumbborder\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/9/9c/Flag_of_Brunei.svg/33px-Flag_of_Brunei.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/9/9c/Flag_of_Brunei.svg/44px-Flag_of_Brunei.svg.png 2x\" data-file-width=\"1440\" data-file-height=\"720\" />&#160;<a href=\"/wiki/Brunei_at_the_Olympics\" title=\"Brunei at the Olympics\">Brunei</a>&#160;<span style=\"font-size:90%;\">(BRU)</span></span> <sup class=\"reference\" id=\"ref_AA\"><a href=\"#endnote_AA\">[A]</a></sup>\n",
    "</td>\n",
    "<td style=\"background:#f2f2ce;\">6</td>\n",
    "<td style=\"background:#cedff2;\">0</td>\n",
    "<td>6\n",
    "</td></tr>\n",
    "<tr>\n",
    "<td align=\"left\"><span id=\"CAM\"><img alt=\"\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/8/83/Flag_of_Cambodia.svg/22px-Flag_of_Cambodia.svg.png\" decoding=\"async\" width=\"22\" height=\"14\" class=\"thumbborder\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/8/83/Flag_of_Cambodia.svg/33px-Flag_of_Cambodia.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/8/83/Flag_of_Cambodia.svg/44px-Flag_of_Cambodia.svg.png 2x\" data-file-width=\"1000\" data-file-height=\"640\" />&#160;<a href=\"/wiki/Cambodia_at_the_Olympics\" title=\"Cambodia at the Olympics\">Cambodia</a>&#160;<span style=\"font-size:90%;\">(CAM)</span></span>\n",
    "</td>\n",
    "<td style=\"background:#f2f2ce;\">10</td>\n",
    "<td style=\"background:#cedff2;\">0</td>\n",
    "<td>10\n",
    "</td></tr>\n",
    "<tr>\n",
    "<td align=\"left\"><span id=\"CPV\"><img alt=\"\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/3/38/Flag_of_Cape_Verde.svg/22px-Flag_of_Cape_Verde.svg.png\" decoding=\"async\" width=\"22\" height=\"13\" class=\"thumbborder\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/3/38/Flag_of_Cape_Verde.svg/33px-Flag_of_Cape_Verde.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/3/38/Flag_of_Cape_Verde.svg/44px-Flag_of_Cape_Verde.svg.png 2x\" data-file-width=\"1020\" data-file-height=\"600\" />&#160;<a href=\"/wiki/Cape_Verde_at_the_Olympics\" title=\"Cape Verde at the Olympics\">Cape Verde</a>&#160;<span style=\"font-size:90%;\">(CPV)</span></span>\n",
    "</td>\n",
    "<td style=\"background:#f2f2ce;\">7</td>\n",
    "<td style=\"background:#cedff2;\">0</td>\n",
    "<td>7\n",
    "</td></tr>\n",
    "<tr>\n",
    "<td align=\"left\"><span id=\"CAY\"><img alt=\"\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/0/0f/Flag_of_the_Cayman_Islands.svg/22px-Flag_of_the_Cayman_Islands.svg.png\" decoding=\"async\" width=\"22\" height=\"11\" class=\"thumbborder\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/0/0f/Flag_of_the_Cayman_Islands.svg/33px-Flag_of_the_Cayman_Islands.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/0/0f/Flag_of_the_Cayman_Islands.svg/44px-Flag_of_the_Cayman_Islands.svg.png 2x\" data-file-width=\"1200\" data-file-height=\"600\" />&#160;<a href=\"/wiki/Cayman_Islands_at_the_Olympics\" title=\"Cayman Islands at the Olympics\">Cayman Islands</a>&#160;<span style=\"font-size:90%;\">(CAY)</span></span>\n",
    "</td>\n",
    "<td style=\"background:#f2f2ce;\">11</td>\n",
    "<td style=\"background:#cedff2;\">2</td>\n",
    "<td>13\n",
    "</td></tr>\n",
    "<tr>\n",
    "<td align=\"left\"><span id=\"CAF\"><img alt=\"\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/6/6f/Flag_of_the_Central_African_Republic.svg/22px-Flag_of_the_Central_African_Republic.svg.png\" decoding=\"async\" width=\"22\" height=\"15\" class=\"thumbborder\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/6/6f/Flag_of_the_Central_African_Republic.svg/33px-Flag_of_the_Central_African_Republic.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/6/6f/Flag_of_the_Central_African_Republic.svg/44px-Flag_of_the_Central_African_Republic.svg.png 2x\" data-file-width=\"900\" data-file-height=\"600\" />&#160;<a href=\"/wiki/Central_African_Republic_at_the_Olympics\" title=\"Central African Republic at the Olympics\">Central African Republic</a>&#160;<span style=\"font-size:90%;\">(CAF)</span></span>\n",
    "</td>\n",
    "<td style=\"background:#f2f2ce;\">11</td>\n",
    "<td style=\"background:#cedff2;\">0</td>\n",
    "<td>11\n",
    "</td></tr>\n",
    "<tr>\n",
    "<td align=\"left\"><span id=\"CHA\"><img alt=\"\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/4/4b/Flag_of_Chad.svg/22px-Flag_of_Chad.svg.png\" decoding=\"async\" width=\"22\" height=\"15\" class=\"thumbborder\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/4/4b/Flag_of_Chad.svg/33px-Flag_of_Chad.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/4/4b/Flag_of_Chad.svg/44px-Flag_of_Chad.svg.png 2x\" data-file-width=\"900\" data-file-height=\"600\" />&#160;<a href=\"/wiki/Chad_at_the_Olympics\" title=\"Chad at the Olympics\">Chad</a>&#160;<span style=\"font-size:90%;\">(CHA)</span></span>\n",
    "</td>\n",
    "<td style=\"background:#f2f2ce;\">13</td>\n",
    "<td style=\"background:#cedff2;\">0</td>\n",
    "<td>13\n",
    "</td></tr>\n",
    "<tr>\n",
    "<td align=\"left\"><span id=\"COM\"><img alt=\"\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/9/94/Flag_of_the_Comoros.svg/22px-Flag_of_the_Comoros.svg.png\" decoding=\"async\" width=\"22\" height=\"13\" class=\"thumbborder\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/9/94/Flag_of_the_Comoros.svg/33px-Flag_of_the_Comoros.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/9/94/Flag_of_the_Comoros.svg/44px-Flag_of_the_Comoros.svg.png 2x\" data-file-width=\"1000\" data-file-height=\"600\" />&#160;<a href=\"/wiki/Comoros_at_the_Olympics\" title=\"Comoros at the Olympics\">Comoros</a>&#160;<span style=\"font-size:90%;\">(COM)</span></span>\n",
    "</td>\n",
    "<td style=\"background:#f2f2ce;\">7</td>\n",
    "<td style=\"background:#cedff2;\">0</td>\n",
    "<td>7\n",
    "</td></tr>\n",
    "<tr>\n",
    "<td align=\"left\"><span id=\"CGO\"><img alt=\"\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/9/92/Flag_of_the_Republic_of_the_Congo.svg/22px-Flag_of_the_Republic_of_the_Congo.svg.png\" decoding=\"async\" width=\"22\" height=\"15\" class=\"thumbborder\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/9/92/Flag_of_the_Republic_of_the_Congo.svg/33px-Flag_of_the_Republic_of_the_Congo.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/9/92/Flag_of_the_Republic_of_the_Congo.svg/44px-Flag_of_the_Republic_of_the_Congo.svg.png 2x\" data-file-width=\"900\" data-file-height=\"600\" />&#160;<a href=\"/wiki/Republic_of_the_Congo_at_the_Olympics\" title=\"Republic of the Congo at the Olympics\">Republic of the Congo</a>&#160;<span style=\"font-size:90%;\">(CGO)</span></span>\n",
    "</td>\n",
    "<td style=\"background:#f2f2ce;\">13</td>\n",
    "<td style=\"background:#cedff2;\">0</td>\n",
    "<td>13\n",
    "</td></tr>\n",
    "<tr>\n",
    "<td align=\"left\"><span id=\"COD\"><img alt=\"\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/6/6f/Flag_of_the_Democratic_Republic_of_the_Congo.svg/22px-Flag_of_the_Democratic_Republic_of_the_Congo.svg.png\" decoding=\"async\" width=\"22\" height=\"17\" class=\"thumbborder\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/6/6f/Flag_of_the_Democratic_Republic_of_the_Congo.svg/33px-Flag_of_the_Democratic_Republic_of_the_Congo.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/6/6f/Flag_of_the_Democratic_Republic_of_the_Congo.svg/44px-Flag_of_the_Democratic_Republic_of_the_Congo.svg.png 2x\" data-file-width=\"800\" data-file-height=\"600\" />&#160;<a href=\"/wiki/Democratic_Republic_of_the_Congo_at_the_Olympics\" title=\"Democratic Republic of the Congo at the Olympics\">Democratic Republic of the Congo</a>&#160;<span style=\"font-size:90%;\">(COD)</span></span> <sup class=\"reference\" id=\"ref_CODCOD\"><a href=\"#endnote_CODCOD\">[COD]</a></sup>\n",
    "</td>\n",
    "<td style=\"background:#f2f2ce;\">11</td>\n",
    "<td style=\"background:#cedff2;\">0</td>\n",
    "<td>11\n",
    "</td></tr>\n",
    "\"\"\"\n",
    "\n",
    "infer_nodes = [Node(text=infer_text)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fb6c6ab1-b56f-4774-9eb9-49c9a04c7cd3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import from the llama-index-program-evaporate package installed above\n",
    "from llama_index.program.evaporate.base import MultiValueEvaporateProgram\n",
    "\n",
    "program = MultiValueEvaporateProgram.from_defaults(\n",
    "    fields_to_extract=[\"countries\", \"medal_count\"],\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b516f9a6-bff3-41d7-9efe-1daf9f89d251",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'countries': 'def get_countries_field(text: str):\\n    \"\"\"\\n    Function to extract countries. \\n    \"\"\"\\n    \\n    # Use regex to extract the countries field\\n    countries_field = re.findall(r\\'<a href=\".*\">(.*)</a>\\', text)\\n    \\n    # Return the result as a list\\n    return countries_field',\n",
       " 'medal_count': 'def get_medal_count_field(text: str):\\n    \"\"\"\\n    Function to extract medal_count. \\n    \"\"\"\\n    \\n    # Use regex to extract the medal count field\\n    medal_count_field = re.findall(r\\'<td style=\"background:#f2f2ce;\">(.*?)</td>\\', text)\\n    \\n    # Return the result as a list\\n    return medal_count_field'}"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "program.fit_fields(train_nodes[:1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cc32440c-910a-483c-81df-80ae81fedb2d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "def get_countries_field(text: str):\n",
      "    \"\"\"\n",
      "    Function to extract countries. \n",
      "    \"\"\"\n",
      "    \n",
      "    # Use regex to extract the countries field\n",
      "    countries_field = re.findall(r'<a href=\".*\">(.*)</a>', text)\n",
      "    \n",
      "    # Return the result as a list\n",
      "    return countries_field\n"
     ]
    }
   ],
   "source": [
    "print(program.get_function_str(\"countries\"))"
   ]
  },
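  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "3b9d2c71-5f0a-4e8b-a6c4-1d7e9f2a4b60",
   "metadata": {},
   "source": [
    "The generated pattern above uses greedy quantifiers (`.*`). As a minimal sketch of how that can affect extraction, the snippet below compares it with a non-greedy variant (`.*?`) on a single line containing two links. The `sample` string is a hypothetical stand-in, not taken from the training nodes, and the non-greedy pattern is our own variation rather than something the program generated.\n",
    "\n",
    "```python\n",
    "import re\n",
    "\n",
    "# Hypothetical one-line HTML sample with two links (not from the notebook's data).\n",
    "sample = '<a href=\"/wiki/Chad\">Chad</a> <a href=\"/wiki/Comoros\">Comoros</a>'\n",
    "\n",
    "# Pattern as generated above: the greedy quantifiers run from the first href\n",
    "# to the last closing tag on the line, so only the final link text is captured.\n",
    "greedy = re.findall(r'<a href=\".*\">(.*)</a>', sample)\n",
    "\n",
    "# Non-greedy variant: each quantifier stops at the earliest possible match,\n",
    "# so every link on the line is captured separately.\n",
    "lazy = re.findall(r'<a href=\".*?\">(.*?)</a>', sample)\n",
    "\n",
    "print(greedy)  # ['Comoros']\n",
    "print(lazy)  # ['Chad', 'Comoros']\n",
    "```\n",
    "\n",
    "Since `.` does not match newlines by default, the greedy pattern still yields one match per line, so it tends to work when each link sits on its own line of the source."
   ]
  },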
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8ed16aa9-8b36-439a-a596-1b90d6775a30",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "def get_medal_count_field(text: str):\n",
      "    \"\"\"\n",
      "    Function to extract medal_count. \n",
      "    \"\"\"\n",
      "    \n",
      "    # Use regex to extract the medal count field\n",
      "    medal_count_field = re.findall(r'<td style=\"background:#f2f2ce;\">(.*?)</td>', text)\n",
      "    \n",
      "    # Return the result as a list\n",
      "    return medal_count_field\n"
     ]
    }
   ],
   "source": [
    "print(program.get_function_str(\"medal_count\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8f7bae5f-ee4e-4d9f-b551-1986efd317b3",
   "metadata": {},
   "outputs": [],
   "source": [
    "result = program(nodes=infer_nodes[:1])"
   ]
  },
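  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "6c4e8a15-2d7b-4f93-b0e1-9a3c5d7f1e28",
   "metadata": {},
   "source": [
    "The call above applies the fitted functions to the inference nodes. As a rough sketch of the idea (an assumption for illustration, not necessarily how the library runs them internally), a function string like the ones printed earlier can be turned into a callable with `exec` and applied to text. The `fn_str` and `html` values below are hypothetical stand-ins.\n",
    "\n",
    "```python\n",
    "import re\n",
    "\n",
    "# Hypothetical function string, shaped like the ones returned by `fit_fields`.\n",
    "fn_str = '''\n",
    "def get_countries_field(text: str):\n",
    "    return re.findall(r'<a href=\".*?\">(.*?)</a>', text)\n",
    "'''\n",
    "\n",
    "# Execute the definition in a namespace that already provides `re`,\n",
    "# since the generated code does not import it itself.\n",
    "namespace = {\"re\": re}\n",
    "exec(fn_str, namespace)\n",
    "\n",
    "# Hypothetical inference text.\n",
    "html = '<a href=\"/wiki/Bhutan\">Bhutan</a>'\n",
    "values = namespace[\"get_countries_field\"](html)\n",
    "print(values)  # ['Bhutan']\n",
    "```\n",
    "\n",
    "Executing LLM-generated code carries the usual risks, so it is worth reviewing the function strings before running them on untrusted input."
   ]
  },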
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "85bc4f9c-9e6c-41da-b6fb-a8b227b3ce67",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Countries: ['Bangladesh', '[BIZ]', '[BEN]', 'Bhutan', 'Bolivia', 'Bosnia and Herzegovina', 'British Virgin Islands', '[A]', 'Cambodia', 'Cape Verde', 'Cayman Islands', 'Central African Republic', 'Chad', 'Comoros', 'Republic of the Congo', '[COD]']\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# output countries (first field in `fields_to_extract`)\n",
    "print(f\"Countries: {result.columns[0].row_values}\\n\")\n",
    "# output medal counts (second field in `fields_to_extract`)\n",
    "print(f\"Medal Counts: {result.columns[1].row_values}\\n\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "820768fe-aa23-4999-bcc1-102e6fc817e5",
   "metadata": {},
   "source": [
    "## Bonus: Use the underlying `EvaporateExtractor`\n",
    "\n",
    "The underlying `EvaporateExtractor` offers additional functionality, such as identifying candidate fields across a set of text nodes.\n",
    "\n",
    "Here we show how to use `identify_fields` to find fields related to a general `topic`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7ff32b4b-a85b-4266-bdf1-7fa492925034",
   "metadata": {},
   "outputs": [],
   "source": [
    "# a list of nodes, one per city, each holding that city's intro paragraph\n",
    "city_pop_nodes = [city_nodes[\"Toronto\"][0], city_nodes[\"Seattle\"][0]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dc96646f-ac7e-407f-87dd-c14c8d83aa84",
   "metadata": {},
   "outputs": [],
   "source": [
    "extractor = program.extractor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1df3a7df-6d00-4487-b114-f45a6dba4764",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Try with Toronto and Seattle (should extract \"population\")\n",
    "existing_fields = extractor.identify_fields(\n",
    "    city_pop_nodes, topic=\"population\", fields_top_k=4\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d8a56bb6-3a26-40db-9ca3-8aa9ed4f2c52",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[\"seattle metropolitan area's population\"]"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "existing_fields"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llama_index_v2",
   "language": "python",
   "name": "llama_index_v2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
